Visualizing the Philips Curve

Visualizing the Philips Curve

The Philips Curve was initially discovered as a statistical relationship between unemployment and inflation. The original version used historical US data.

Philips Curve

Our goal here is to visually inspect the Philips curve using recent data, for several countries.

In the process we will learn to: - import dataframes, inspect them, merge them, clean the resulting data - use matplotlib to create graphs - bonus: experiment with other plotting libraries

Importing the Data

We start by loading library dbnomics. It is installed on the Nuvolos servers.

import dbnomics

The following code imports data for from dbnomics for a few countries.

table_1 = dbnomics.fetch_series([
    "OECD/DP_LIVE/FRA.CPI.TOT.AGRWTH.Q",
    "OECD/DP_LIVE/GBR.CPI.TOT.AGRWTH.Q",
    "OECD/DP_LIVE/USA.CPI.TOT.AGRWTH.Q",
    "OECD/DP_LIVE/DEU.CPI.TOT.AGRWTH.Q"
])
table_2 = dbnomics.fetch_series([
    "OECD/MEI/DEU.LRUNTTTT.STSA.Q",
    "OECD/MEI/FRA.LRUNTTTT.STSA.Q",
    "OECD/MEI/USA.LRUNTTTT.STSA.Q",
    "OECD/MEI/GBR.LRUNTTTT.STSA.Q"
])

Describe concisely the data that has been imported (periodicity, type of measure, …). You can either check dbnomics website or look at the databases.

  • table_1:
    • growth of CPI index. The total of all goods. Quarterly data for France, UK, USA, Germany
  • table_2
    • unemployment rate coming from LO

Show the first rows of each database. Make a list of all columns.

# first table
table_1.head(3)
@frequency provider_code dataset_code dataset_name series_code series_name original_period period original_value value LOCATION INDICATOR SUBJECT MEASURE FREQUENCY Country Indicator Subject Measure Frequency
0 quarterly OECD DP_LIVE OECD Data Live dataset FRA.CPI.TOT.AGRWTH.Q France – Inflation (CPI) – Total – Annual grow... 1956-Q1 1956-01-01 1.746324 1.746324 FRA CPI TOT AGRWTH Q France Inflation (CPI) Total Annual growth rate (%) Quarterly
1 quarterly OECD DP_LIVE OECD Data Live dataset FRA.CPI.TOT.AGRWTH.Q France – Inflation (CPI) – Total – Annual grow... 1956-Q2 1956-04-01 1.838658 1.838658 FRA CPI TOT AGRWTH Q France Inflation (CPI) Total Annual growth rate (%) Quarterly
2 quarterly OECD DP_LIVE OECD Data Live dataset FRA.CPI.TOT.AGRWTH.Q France – Inflation (CPI) – Total – Annual grow... 1956-Q3 1956-07-01 2.670692 2.670692 FRA CPI TOT AGRWTH Q France Inflation (CPI) Total Annual growth rate (%) Quarterly
# to get the list of columns:
table_1.columns
Index(['@frequency', 'provider_code', 'dataset_code', 'dataset_name',
       'series_code', 'series_name', 'original_period', 'period',
       'original_value', 'value', 'LOCATION', 'INDICATOR', 'SUBJECT',
       'MEASURE', 'FREQUENCY', 'Country', 'Indicator', 'Subject', 'Measure',
       'Frequency'],
      dtype='object')
# second table
table_2.head(3)
@frequency provider_code dataset_code dataset_name series_code series_name original_period period original_value value LOCATION SUBJECT MEASURE FREQUENCY Country Subject Measure Frequency
0 quarterly OECD MEI Main Economic Indicators Publication DEU.LRUNTTTT.STSA.Q Germany – Labour Force Survey - quarterly rate... 1962-Q1 1962-01-01 0.442249 0.442249 DEU LRUNTTTT STSA Q Germany Labour Force Survey - quarterly rates > Unempl... Level, rate or national currency, s.a. Quarterly
1 quarterly OECD MEI Main Economic Indicators Publication DEU.LRUNTTTT.STSA.Q Germany – Labour Force Survey - quarterly rate... 1962-Q2 1962-04-01 0.444882 0.444882 DEU LRUNTTTT STSA Q Germany Labour Force Survey - quarterly rates > Unempl... Level, rate or national currency, s.a. Quarterly
2 quarterly OECD MEI Main Economic Indicators Publication DEU.LRUNTTTT.STSA.Q Germany – Labour Force Survey - quarterly rate... 1962-Q3 1962-07-01 0.450347 0.450347 DEU LRUNTTTT STSA Q Germany Labour Force Survey - quarterly rates > Unempl... Level, rate or national currency, s.a. Quarterly
# to get the list of columns:
table_2.columns
Index(['@frequency', 'provider_code', 'dataset_code', 'dataset_name',
       'series_code', 'series_name', 'original_period', 'period',
       'original_value', 'value', 'LOCATION', 'SUBJECT', 'MEASURE',
       'FREQUENCY', 'Country', 'Subject', 'Measure', 'Frequency'],
      dtype='object')

Compute averages and standard deviations for unemployment and inflation, per country.

# reminder: to get averages for a whole table
# we extract the relevant column with table_1['value']
print("Mean Inflation: ", table_1['value'].mean())
print("Mean Unemployment: ", table_2['value'].mean())
Mean Inflation:  3.8910539919538323
Mean Unemployment:  6.098550447670419
# reminder: to get averages for a whole s
print("Std. Deviation Inflation: ", table_1['value'].std())
print("Std. Deviation Unemployment: ", table_2['value'].std())
Std. Deviation Inflation:  3.5916466190299996
Std. Deviation Unemployment:  2.550612600349251

There are two strategies to compute statistics per group.

# option 1: by using pandas boolean selection 

The first approach consists in keeping only the observations for a given country:

table_1['Country'].unique()
array(['France', 'United Kingdom', 'United States', 'Germany'],
      dtype=object)
country = "France"
is_france = (table_1['Country']==country)
table_1_france = table_1[is_france]
table_1_france['Country'].value_counts() # check 
Country
France    271
Name: count, dtype: int64
# the preceding code can be replicated more concisely:
table_1_france = table_1[table_1['Country']=="France"]
# or using a database query:
table_1_france = table_1.query("Country=='France'")
# we can now compute the statistics:
print("Mean Inflation: ", table_1_france['value'].mean())
print("Std. Deviation Inflation: ", table_1_france['value'].std())
Mean Inflation:  4.218004559985239
Std. Deviation Inflation:  3.85319032016491
# Here is a code to do it for all countries:

for country in table_1['Country'].unique():
    print(f'Statistics for: {country}')

    # inflation
    table_1_country = table_1[table_1['Country']==country]
    print("* Mean Inflation: ", table_1_country['value'].mean())
    print("* Std. Deviation Inflation: ", table_1_country['value'].std())

    # unemployment
    table_2_country = table_2[table_2['Country']==country]
    print("* Mean Inflation: ", table_2_country['value'].mean())
    print("* Std. Deviation Inflation: ", table_2_country['value'].std())
    print() # add empty line
Statistics for: France
* Mean Inflation:  4.218004559985239
* Std. Deviation Inflation:  3.85319032016491
* Mean Inflation:  8.680560014887725
* Std. Deviation Inflation:  0.9840354763739427

Statistics for: United Kingdom
* Mean Inflation:  5.003996036162361
* Std. Deviation Inflation:  4.768592756710978
* Mean Inflation:  6.70511374527728
* Std. Deviation Inflation:  2.388539226072565

Statistics for: United States
* Mean Inflation:  3.678550917822878
* Std. Deviation Inflation:  2.7795046092217035
* Mean Inflation:  5.851086956521739
* Std. Deviation Inflation:  1.667886470619543

Statistics for: Germany
* Mean Inflation:  2.659118566666667
* Std. Deviation Inflation:  1.8661314171203747
* Mean Inflation:  4.989272203471847
* Std. Deviation Inflation:  3.0671989448416817
# option 2: by using groupby

Many statistics can be computed by group.

print("Mean Inflation")
table_1.groupby('Country')['value'].agg('mean')
Mean Inflation
Country
France            4.218005
Germany           2.659119
United Kingdom    5.003996
United States     3.678551
Name: value, dtype: float64
table_1.columns
Index(['@frequency', 'provider_code', 'dataset_code', 'dataset_name',
       'series_code', 'series_name', 'original_period', 'period',
       'original_value', 'value', 'LOCATION', 'INDICATOR', 'SUBJECT',
       'MEASURE', 'FREQUENCY', 'Country', 'Indicator', 'Subject', 'Measure',
       'Frequency'],
      dtype='object')
print("Std Inflation")
table_1.groupby('Country')['value'].agg('std')
Std Inflation
Country
France            3.853190
Germany           1.866131
United Kingdom    4.768593
United States     2.779505
Name: value, dtype: float64
# This can be done for several stats at the same time:

print("Statistics for Inflation")
table_1.groupby('Country')['value'].agg(['mean','std'])
Statistics for Inflation
mean std
Country
France 4.218005 3.853190
Germany 2.659119 1.866131
United Kingdom 5.003996 4.768593
United States 3.678551 2.779505
# Same stats for Unemployment.

print("Statistics for Unemployment")
table_2.groupby('Country')['value'].agg(['mean','std'])
Statistics for Unemployment
mean std
Country
France 8.680560 0.984035
Germany 4.989272 3.067199
United Kingdom 6.705114 2.388539
United States 5.851087 1.667886

Comment: we observe differences in the level of structural unemployment. Volatility in Germany is rather impressive.

The following command merges the two databases together. Explain the role of argument on. What happened to the column names?

table = table_1.merge(table_2, on=["period", 'Country']) 

The on argument indicate, which columns identify a unique observation. Here the date and the country denote the same observation in both countries. However the column value of the first and the second database have a different meaning (resp unemployment and inflation). To distinguish them, they receive a suffix (_x and _y respectively).

We rename the new names for the sake of clarity and normalize everything with lower cases.

table = table.rename(columns={
    'period': 'date',         # because it sounds more natural
    'Country': 'country',
    'value_x': 'inflation',
    'value_y': 'unemployment'
})

On the merged table, compute at once all the statistics computed before (use groupby and agg).

table[['country','inflation', 'unemployment']].groupby('country').agg( [ 'mean', 'std'])
inflation unemployment
mean std mean std
country
France 1.664349 1.381470 8.680560 0.984035
Germany 2.730136 1.918303 4.989272 3.067199
United Kingdom 5.404707 5.253143 6.705114 2.388539
United States 3.678551 2.779505 5.880812 1.668049

Before we process further, we should tidy the dataframe by keeping only what we need. - Keep only the columns date, country, inflation and unemployment - Drop all na values - Make a copy of the result

df = table[['date', 'country', 'inflation', 'unemployment']].dropna()
df = df.copy()
# note: the copy() function is here to avoid keeping references to the original database
print(df.shape)
(811, 4)

What is the maximum availability interval for each country? How would you proceed to keep only those dates where all datas are available? In the following we keep the f

# optional: here is a quick way to visualize availability dates
import altair as alt
alt.Chart(df).mark_point().encode(x='date', y='country', color='country')
/opt/conda/lib/python3.10/site-packages/altair/utils/core.py:395: FutureWarning: the convert_dtype parameter is deprecated and will be removed in a future version.  Do ``ser.astype(object).apply()`` instead if you want ``convert_dtype=False``.
  col = df[col_name].apply(to_list_if_array, convert_dtype=False)
cdf = df.dropna() # remove all lines with non available values
cdf.shape # compare with the cell above: there was no line with na.
(811, 4)
# the following computes all dates where data is available in all 4 countries
all_available = cdf.groupby("date")['inflation'].count() == 4
all_available
date
1956-01-01    False
1956-04-01    False
1956-07-01    False
1956-10-01    False
1957-01-01    False
              ...  
2022-07-01     True
2022-10-01     True
2023-01-01     True
2023-04-01     True
2023-07-01     True
Name: inflation, Length: 271, dtype: bool
# keep only dates
common_dates = all_available[all_available].index
common_dates
DatetimeIndex(['2003-01-01', '2003-04-01', '2003-07-01', '2003-10-01',
               '2004-01-01', '2004-04-01', '2004-07-01', '2004-10-01',
               '2005-01-01', '2005-04-01', '2005-07-01', '2005-10-01',
               '2006-01-01', '2006-04-01', '2006-07-01', '2006-10-01',
               '2007-01-01', '2007-04-01', '2007-07-01', '2007-10-01',
               '2008-01-01', '2008-04-01', '2008-07-01', '2008-10-01',
               '2009-01-01', '2009-04-01', '2009-07-01', '2009-10-01',
               '2010-01-01', '2010-04-01', '2010-07-01', '2010-10-01',
               '2011-01-01', '2011-04-01', '2011-07-01', '2011-10-01',
               '2012-01-01', '2012-04-01', '2012-07-01', '2012-10-01',
               '2013-01-01', '2013-04-01', '2013-07-01', '2013-10-01',
               '2014-01-01', '2014-04-01', '2014-07-01', '2014-10-01',
               '2015-01-01', '2015-04-01', '2015-07-01', '2015-10-01',
               '2016-01-01', '2016-04-01', '2016-07-01', '2016-10-01',
               '2017-01-01', '2017-04-01', '2017-07-01', '2017-10-01',
               '2018-01-01', '2018-04-01', '2018-07-01', '2018-10-01',
               '2019-01-01', '2019-04-01', '2019-07-01', '2019-10-01',
               '2020-01-01', '2020-04-01', '2020-07-01', '2020-10-01',
               '2021-01-01', '2021-04-01', '2021-07-01', '2021-10-01',
               '2022-01-01', '2022-04-01', '2022-07-01', '2022-10-01',
               '2023-01-01', '2023-04-01', '2023-07-01'],
              dtype='datetime64[ns]', name='date', freq=None)
# This can be done for several stats at the same time:
cdf[ cdf['date'].isin(common_dates) ] 
date country inflation unemployment
0 2003-01-01 France 2.366263 7.922234
1 2003-04-01 France 1.912854 8.089598
2 2003-07-01 France 1.932270 8.036090
3 2003-10-01 France 2.184437 8.349410
4 2004-01-01 France 1.800087 8.518631
... ... ... ... ...
807 2022-07-01 Germany 7.402639 3.181081
808 2022-10-01 Germany 8.580543 3.059473
809 2023-01-01 Germany 8.236768 2.961556
810 2023-04-01 Germany 6.546894 2.963810
811 2023-07-01 Germany 5.603836 2.970260

332 rows × 4 columns

Our DataFrame is now ready for further analysis !

# note: 
# the following code also works when dataframe is not cylindric
# df = cdf # uncomment if you want to work on cylindric data

Plotting using matplotlib

Our goal now consists in plotting inflation against unemployment to see whether a pattern emerges. We will first work on France.

from matplotlib import pyplot as plt

Create a database df_fr which contains only the data for France.

df_fr = df[ df['country'] == 'France' ].copy() # again, we copy the result, because we plan to modify it

The following command create a line plot for inflation against unemployment. Can you transform it into a scatterplot ?

plt.plot(df_fr['unemployment'], df_fr['inflation'], 'o')

Expand the above command to make the plot nicer (label, title, grid, …)

plt.plot(df_fr['unemployment'], df_fr['inflation'],'.')
plt.grid()
plt.xlabel("Unemployment (%)")
plt.ylabel("Inflation (%)")
plt.title("Phillips Curve")
Text(0.5, 1.0, 'Phillips Curve')

Visualizing the regression

The following piece of code regresses inflation on unemployment.

from statsmodels.formula import api as sm
model = sm.ols(formula='inflation ~ unemployment', data=df_fr)
result = model.fit()

We can use the resulting model to “predict” inflation from unemployment.

result.predict(df_fr['unemployment'])
0     2.366810
1     2.211777
2     2.261342
3     1.971104
4     1.814349
        ...   
78    3.064908
79    3.055976
80    3.141252
81    2.926298
82    2.888369
Length: 83, dtype: float64

Store the result in df_fr as reg_unemployment

df_fr['pred_inflation'] = result.predict(df_fr['unemployment'])

By expanding again, the command above to make a plot, add the regression line to the scatter plot.

plt.plot(df_fr['unemployment'], df_fr['inflation'],'o', alpha=0.5)
plt.plot(df_fr['unemployment'], df_fr['pred_inflation'], color='C0')
plt.title("Philips Curve")
plt.xlabel("Unemployment (%)")
plt.ylabel("Inflation")
plt.grid()

Now we would like to compare all countries. Can you find a way to represent the data for all of them (all on one graph, using subplots…) ?

# this solution uses loops and iterators but the same can be done manually
countries = df['country'].unique()
df_countries = []
for c in countries:
    
    tmp = df[df['country']==c].copy()
    
    model = sm.ols(formula='inflation ~ unemployment', data=tmp)
    result = model.fit()
    
    tmp['pred_inflation'] = result.predict(tmp['unemployment'])

    df_countries.append(tmp)
# all on one graph
for i, (d, c) in enumerate(zip(df_countries, countries)):
    plt.plot(d['unemployment'], d['inflation'], 'o', alpha=0.2, label=c, color=f"C{i}") # missing 'o'
    plt.plot(d['unemployment'], d['pred_inflation'],   color=f"C{i}") # missing 'o'
plt.legend(loc='upper right')

# using subplots
# all on one graph
fig = plt.subplots(2,2)
for i, (d, c) in enumerate(zip(df_countries, countries)):
    plt.subplot(2,2,i+1)
    plt.plot(d['unemployment'], d['inflation'], 'o', alpha=0.2, color=f"C{i}") # missing 'o'
    plt.plot(d['unemployment'], d['pred_inflation'],   color=f"C{i}") # missing 'o'
    plt.xlabel("Unemployment")
    plt.ylabel("Inflation")
    plt.title(c)
    plt.tight_layout()

Any comment on these results?

  • First it would be nice to break the period into many subperiods. We know the story about the disappearnce of the Philips curve in the US (and the Lucas Critique).
  • Second, there seems to be a strong contrast between US/UK and Germany/France. If one remembers the mechanisms behind the Philips curve it is not surprising: prices and wages are more rigid in Europe.
    • In the case of Germany, the conclusion is certainly not too strong: there has been a lot of volatility.
    • In the case of France, the time span of the time series is much smaller which also weakens the conclusion.

Bonus: Visualizing data using altair

Altair is a visualization library (based on Vega-lite) which offers a different syntax to make plots.

It is well adapted to the exploration phase, as it can operate on a full database (without splitting it like we did for matplotlib). It also provides some data transformation tools like regressions, and ways to add some interactivity.

import altair as alt

The following command makes a basic plot from the dataframe df which contains all the countries. Can you enhance it by providing a title and encoding information to distinguish the various countries?

chart = alt.Chart(df).mark_point().encode(
    x='unemployment',
    y='inflation',
    # find a way to separate on the graph data from France and US
).interactive()
chart
/opt/conda/lib/python3.10/site-packages/altair/utils/core.py:395: FutureWarning: the convert_dtype parameter is deprecated and will be removed in a future version.  Do ``ser.astype(object).apply()`` instead if you want ``convert_dtype=False``.
  col = df[col_name].apply(to_list_if_array, convert_dtype=False)
# solution: 
chart = alt.Chart(df, title="Data for all countries").mark_point().encode(
    x='unemployment',
    y='inflation',
    color="country",
)
chart
/opt/conda/lib/python3.10/site-packages/altair/utils/core.py:395: FutureWarning: the convert_dtype parameter is deprecated and will be removed in a future version.  Do ``ser.astype(object).apply()`` instead if you want ``convert_dtype=False``.
  col = df[col_name].apply(to_list_if_array, convert_dtype=False)

The following graph plots a regression line, but for all countries, it is rather meaningless. Can you restrict the data to France only?

chart = alt.Chart(df).mark_point().encode(
    x='unemployment',
    y='inflation',
    # find a way to separate on the graph data from France and US
)
chart + chart.transform_regression('unemployment', 'inflation').mark_line()
/opt/conda/lib/python3.10/site-packages/altair/utils/core.py:395: FutureWarning: the convert_dtype parameter is deprecated and will be removed in a future version.  Do ``ser.astype(object).apply()`` instead if you want ``convert_dtype=False``.
  col = df[col_name].apply(to_list_if_array, convert_dtype=False)
# solution (it is also possible to replace df by df_fr...)
chart = alt.Chart(df).transform_filter('datum.country=="France"').mark_point().encode(
    x='unemployment',
    y='inflation',
    # find a way to separate on the graph data from France and US
)
chart + chart.transform_regression('unemployment', 'inflation').mark_line()
/opt/conda/lib/python3.10/site-packages/altair/utils/core.py:395: FutureWarning: the convert_dtype parameter is deprecated and will be removed in a future version.  Do ``ser.astype(object).apply()`` instead if you want ``convert_dtype=False``.
  col = df[col_name].apply(to_list_if_array, convert_dtype=False)

One way to visualize data consists in adding some interactivity. Add some title and click on the legend

multi = alt.selection_multi(fields=["country"])

legend = alt.Chart(df).mark_point().encode(
    y=alt.Y('country:N', axis=alt.Axis(orient='right')),
    color=alt.condition(multi, 'country:N', alt.value('lightgray'), legend=None)
).add_selection(multi)

chart_2 = alt.Chart(df).mark_point().encode(
    x='unemployment',
    y='inflation',
    color=alt.condition(multi, 'country:N', alt.value('lightgray')),
    # find a way to separate on the graph data from France and US
)

# Try to click on the legend
chart_2 | legend
/opt/conda/lib/python3.10/site-packages/altair/utils/deprecation.py:65: AltairDeprecationWarning: 'selection_multi' is deprecated.  Use 'selection_point'
  warnings.warn(message, AltairDeprecationWarning, stacklevel=1)
/opt/conda/lib/python3.10/site-packages/altair/utils/deprecation.py:65: AltairDeprecationWarning: 'add_selection' is deprecated. Use 'add_params' instead.
  warnings.warn(message, AltairDeprecationWarning, stacklevel=1)
/opt/conda/lib/python3.10/site-packages/altair/utils/core.py:395: FutureWarning: the convert_dtype parameter is deprecated and will be removed in a future version.  Do ``ser.astype(object).apply()`` instead if you want ``convert_dtype=False``.
  col = df[col_name].apply(to_list_if_array, convert_dtype=False)

Bonus question: in the following graph you can select an interval in the left panel to select some subsample. Can you add the regression line(s) corresponding to the selected data to the last graph?

brush = alt.selection_interval(encodings=['x'],)

historical_chart_1 = alt.Chart(df).mark_line().encode(
    x='date',
    y='unemployment',
    color='country'
).add_selection(
    brush
)
historical_chart_2 = alt.Chart(df).mark_line().encode(
    x='date',
    y='inflation',
    color='country'
)
chart = alt.Chart(df).mark_point().encode(
    x='unemployment',
    y='inflation',
    # find a way to separate on the graph data from France and US
    color=alt.condition(brush, 'country:N', alt.value('lightgray'))
)
alt.hconcat(historical_chart_1, historical_chart_2, chart,)
/opt/conda/lib/python3.10/site-packages/altair/utils/deprecation.py:65: AltairDeprecationWarning: 'add_selection' is deprecated. Use 'add_params' instead.
  warnings.warn(message, AltairDeprecationWarning, stacklevel=1)
/opt/conda/lib/python3.10/site-packages/altair/utils/core.py:395: FutureWarning: the convert_dtype parameter is deprecated and will be removed in a future version.  Do ``ser.astype(object).apply()`` instead if you want ``convert_dtype=False``.
  col = df[col_name].apply(to_list_if_array, convert_dtype=False)

Bonus 2: Plotly Express

Another popular option is the plotly library for nice-looking interactive plots. Combined with dash or shiny, it can be used to build very powerful interactive interfaces.

import plotly.express as px
fig = px.scatter(df, x='unemployment', y='inflation', color='country', title="Philips Curves")
fig
/opt/conda/lib/python3.10/site-packages/plotly/express/_core.py:1979: FutureWarning: When grouping with a length-1 list-like, you will need to pass a length-1 tuple to get_group in a future version of pandas. Pass `(name,)` instead of `name` to silence this warning.
  sf: grouped.get_group(s if len(s) > 1 else s[0])